Summarizing with Wikipedia

Authors

  • Abdullah Bawakid
  • Mourad Oussalah
Abstract

This paper describes a query-based multi-document summarizer built to participate in the update summarization task of TAC10. The system relies on a thesaurus extracted from Wikipedia, which serves as its underlying ontology. Concepts detected in the documents are used as weighted features to score the document sentences. The relationships defined in the thesaurus between concepts help identify the most important concepts within a document or a set of documents. Sentences are ranked by their assigned scores, and the summary is formed from the highest-ranking sentences until the 100-word limit is reached. The evaluation results and the performance of the system are described. The system ranked 7th in the manual evaluation of this year's update task, out of 43 runs submitted by all participants.
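
The ranking-and-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that concept weights are already available as a plain dictionary (the names `concept_weights`, `score_sentence`, and `summarize` are hypothetical), that a concept is detected by simple substring matching, and that selection stops at the first sentence that would exceed the word limit.

```python
from typing import Dict, List


def score_sentence(sentence: str, concept_weights: Dict[str, float]) -> float:
    """Sum the weights of the concepts whose surface form occurs in the sentence.

    Assumption: concepts are matched by lowercase substring search; the paper's
    own concept detection against the Wikipedia thesaurus is not reproduced here.
    """
    text = sentence.lower()
    return sum(w for concept, w in concept_weights.items() if concept in text)


def summarize(
    sentences: List[str],
    concept_weights: Dict[str, float],
    word_limit: int = 100,
) -> List[str]:
    """Pick the highest-scoring sentences until the word limit would be exceeded."""
    # Rank sentences by score, highest first (sorted() is stable for ties).
    ranked = sorted(
        sentences,
        key=lambda s: score_sentence(s, concept_weights),
        reverse=True,
    )
    summary: List[str] = []
    used = 0
    for sentence in ranked:
        n_words = len(sentence.split())
        if used + n_words > word_limit:
            break  # assumption: stop rather than truncate or skip ahead
        summary.append(sentence)
        used += n_words
    return summary


if __name__ == "__main__":
    # Toy weights and sentences, made up purely for illustration.
    weights = {"wikipedia": 2.0, "thesaurus": 1.5, "concepts": 1.0}
    docs = [
        "The weather was nice.",
        "A Wikipedia thesaurus links related concepts.",
        "Concepts are used as weighted features.",
    ]
    for line in summarize(docs, weights, word_limit=12):
        print(line)
```

In the system described above, the weights would come from the concepts and relationships in the Wikipedia-derived thesaurus rather than from a hand-written dictionary.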

Similar resources

Extending DBpedia with List Structures in Wikipedia Articles

Ontologies are the basis of the Semantic Web. Owing to the cost of their construction and maintenance, however, there is much interest in automating their construction. Wikipedia is considered a promising source of knowledge because of its own characteristics. DBpedia extracts a large amount of ontological information from Wikipedia. However, DBpedia focuses exclusively on infoboxes (i.e., tabl...

WikiTopics: What is Popular on Wikipedia and Why

We establish a novel task in the spirit of news summarization and topic detection and tracking (TDT): daily determination of the topics newly popular with Wikipedia readers. Central to this effort is a new public dataset consisting of the hourly page view statistics of all Wikipedia articles over the last three years. We give baseline results for the tasks of: discovering individual pages of in...

Generating Wikipedia by Summarizing Long Sequences

We show that generating English Wikipedia articles can be approached as a multidocument summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical enc...

"The sum of all human knowledge": A systematic review of scholarly research on the content of Wikipedia

Wikipedia might possibly be the best-developed attempt thus far of the enduring quest to gather all human knowledge in one place. Its accomplishments in this regard have made it an irresistible point of inquiry for researchers from various fields of knowledge. A decade of research has thrown light on many aspects of the Wikipedia community, its processes, and content. However, due to the variet...

Simple supervised document geolocation with geodesic grids

We investigate automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and it is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evid...

Concept Based Tie-breaking and Maximal Marginal Relevance Retrieval in Microblog Retrieval

An enormous number of tweets are posted on any given day, and the number keeps increasing. As a result, the need to retrieve tweets effectively according to a user's information need, and to summarize tweets pertaining to a given topic, has become increasingly important. In this paper, Wikipedia concepts [1] were introduced in tie-breaking to perform ad-hoc microblog retrieval. The Maximal Marginal R...

Publication date: 2010